Fault isolation method of industrial process based on regularization framework

ABSTRACT

Provided is a fault isolation method of industrial process based on regularization framework, including the steps of: collecting and filtering sample data in industrial process to obtain an available sample data set; establishing an objective function for fault isolation in industrial process with local and global regularization items; calculating the optimal solution to the objective function from the available sample data set; and obtaining a predicted classification label matrix from the optimal solution to determine the fault information in the process. The method uses the local regularization item to give the optimal solution desirable properties, and uses the global regularization item to correct the low fault isolation precision that the local regularization item may cause. Experiments show that the method is feasible, provides high fault isolation precision, and mines the potential information of labeled sample data.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the priority of Chinese patent application No. 201510816035.7, filed on Nov. 19, 2015, which is incorporated herewith by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention belongs to the technical field of industrial process monitoring, and in particular relates to a fault isolation method of industrial process based on regularization framework.

2. The Prior Arts

A fault means that one or more characteristics or variables in the system deviate from the normal state to a great extent. In a broad sense, a fault can be explained as any abnormal phenomenon resulting in unexpected characteristics in the system. Once the system has a fault, the performance of the system may be reduced to below the normal level, so it is difficult to achieve the expected result and function. A fault which cannot be removed and solved in time may cause a production accident.

The industrial process monitoring technology is a discipline based on fault isolation technology and is used to conduct research on the enhancement of product quality, system reliability and device maintainability, having great significance for ensuring safe operation of complex industrial process.

The sample data generated in industrial process are mainly classified into labeled sample data and unlabeled sample data. Labeled sample data is usually difficult to acquire, because it is constrained by the production conditions of the actual work site and often needs labeling by experts or experienced workers in the field concerned, which is time-consuming and expensive. Therefore, the data generated in industrial process contains little labeled sample data and is mostly unlabeled sample data. How to reasonably use labeled sample data and unlabeled sample data to reduce the cost of manually labeling sample data has become a research hotspot for data-driven fault isolation methods in recent years. However, the information of the labeled sample data has not been fully mined so far, so how to enhance the generalization ability of a classifier as much as possible given a small amount of labeled sample data that is not accurate enough, and how to make full use of a large number of cheap unlabeled samples to enhance the precision of fault isolation, have become research hotspots in the fault isolation field.

SUMMARY OF THE INVENTION

Aiming at the defects of the prior art, the present invention provides a fault isolation method of industrial process based on regularization framework.

The present invention has the following technical schemes.

A fault isolation method of industrial process based on regularization framework comprises the steps of:

step 1: collecting the sample data in industrial process;

step 2: filtering the collected sample data to remove singular sample data and retain available sample data; wherein the available sample data includes labeled sample data and unlabeled sample data; the labeled sample data is data whose characteristics have been differentiated by experienced experts or workers, who label the collected data as normal sample data or fault sample data together with the category of the corresponding fault state, so that these sample data have classification labels; the unlabeled data is data which is directly collected but not labeled and has no classification label, wherein the available sample data set is expressed as:

T={(x₁,y₁), . . . , (x_(l),y_(l))}∪{x_(l+1), . . . , x_(n)}; x_(j)∈R^(d), j=1, . . . , n  (1)

wherein d is the number of variables; n is the number of samples; x_(i)|_(i=1)^(l) is the labeled sample data, and x_(i)|_(i=l+1)^(n) is the unlabeled data; y_(i)∈{1, 2, . . . , c}, i=1, . . . , l, wherein c is the number of fault state categories, and l is the number of labeled samples;

step 3: establishing an objective function for fault isolation in industrial process,

$\begin{matrix}{J(F) = \min\limits_{F \in R^{n \times c}}\mathrm{tr}\left( (F - Y)^{T}D(F - Y) + \frac{\gamma}{n^{2}}F^{T}GF + F^{T}MF \right)} & (2)\end{matrix}$

wherein F is a predicted classification label matrix; tr is the trace of the matrix; D is a diagonal matrix whose diagonal elements are D_(ii)=D_(l)>0 for i=1, . . . , l and D_(ii)=D_(u)≧0 for i=l+1, . . . , n; (F−Y)^(T)D(F−Y) is the empirical loss used to measure the difference between the predicted classification label and the initial classification label; γ is a regulation parameter;

$\frac{\gamma}{n^{2}}F^{T}GF$

is a global regularization item, and G is a global regularization matrix; F^(T)MF is a local regularization item, and M is a local regularization matrix; Y∈R^(n×c) is an initial classification label matrix, and the elements of Y are defined as follows:

$\begin{matrix}{Y_{ij} = \begin{cases} 1, & \text{if } x_{i} \text{ is labeled as the category } j \text{ fault state, where } j \text{ is one of the } c \text{ fault state categories} \\ 0, & \text{otherwise} \end{cases}} & (3)\end{matrix}$
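As an illustration of Formula (3) and of the diagonal matrix D, the following minimal Python sketch builds Y and D for a small data set. The function name `build_label_matrices` and the default values `d_l=1.0` and `d_u=0.0` are assumptions made only for this example; the specification leaves D_(l) and D_(u) to be chosen by the user.

```python
import numpy as np

def build_label_matrices(labels, n, c, d_l=1.0, d_u=0.0):
    """Build Y (Formula (3)) and the diagonal matrix D.

    labels: class indices in 1..c for the first l samples (the labeled ones).
    d_l, d_u: illustrative values for D_l > 0 and D_u >= 0 (assumptions).
    """
    labels = np.asarray(labels)
    l = len(labels)
    Y = np.zeros((n, c))
    Y[np.arange(l), labels - 1] = 1.0   # one-hot rows for labeled samples
    d = np.full(n, d_u)                 # unlabeled samples get D_u
    d[:l] = d_l                         # labeled samples get D_l
    return Y, np.diag(d)

# Five samples, three fault categories, the first three samples labeled.
Y, D = build_label_matrices([1, 3, 2], n=5, c=3)
```

Rows of Y that belong to unlabeled samples stay all-zero, which matches the "otherwise" branch of Formula (3).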

step 4: calculating the optimal solution F* for the objective function for fault isolation in industrial process shown in Formula (2) by the available sample data set;

step 5: obtaining the predicted classification label matrix by Formula (4) according to the optimal solution F* to determine the fault information in the process,

$\begin{matrix}{f_{i} = \underset{1 \leq j \leq c}{\arg\max}\, F_{ij}^{*}} & (4)\end{matrix}$

wherein f_(i) is the predicted classification label of the sample point x_(i); according to the fault isolation method of industrial process based on regularization framework, step 4 includes the steps of:

step 4.1: obtaining a global regularization matrix G according to the improved similarity measurement algorithm and k-nearest neighbor (KNN) classification algorithm,

wherein G can be calculated by Formula (5),

G=S−W∈R ^(n×n)  (5)

wherein Formula (5) is further improved by a regularized Laplacian matrix to obtain Formula (6):

$\begin{matrix}{G = I - S^{- \frac{1}{2}}WS^{- \frac{1}{2}} \in R^{n \times n}} & (6)\end{matrix}$

wherein I is the n×n unit matrix; S is a diagonal matrix, wherein the diagonal elements are

$S_{ii} = \sum\limits_{j = 1}^{n}W_{ij},$

i=1, 2, . . . , n; W∈R^(n×n) is a similarity matrix; W and the sample points x_(i)|_(i=1)^(n) form an undirected weighted graph in which each vertex corresponds to a sample point and the edge W_(ij) corresponds to the similarity of the sample points x_(i)|_(i=1)^(n) and x_(j)|_(j=1)^(n); the precision of the final fault classification is determined by the calculation method of W; W is calculated by the method of local reconstruction using the neighbor points of the sample point x_(i), and the reconstruction error equation is as follows:

$\begin{matrix}{\sum\limits_{i = 1}^{n}\left\| x_{i} - \sum\limits_{j = 1}^{k}W_{ij}x_{ij} \right\|^{2}} & (7)\end{matrix}$

wherein

$\sum\limits_{j = 1}^{k}W_{ij} = 1,$

and the minimum value of Formula (7) is calculated to get W and then G by Formula (5); the specific steps for calculating W are as follows:

step 4.1.1: obtaining the distance measurement between x_(i) and its k neighbor points by the improved distance Formula (8) to calculate the distance between sample points, i.e., sample similarity measurement;

$\begin{matrix}{W_{ij} = d(x_{i},x_{j}) = \frac{\left\| x_{i} - x_{j} \right\|}{\sqrt{M(i)M(j)}}} & (8)\end{matrix}$

M(i) and M(j) respectively represent the average value of distances between the sample point x_(i) and its k neighbors and the average value of distances between the sample point x_(j) and its k neighbors;

step 4.1.2: converting Formula (8) to Formula (9) through kernel mapping;

$\begin{matrix}{d(x_{i},x_{j}) = \frac{\sqrt{K_{ii} - 2K_{ij} + K_{jj}}}{\sqrt{\Delta}}} & (9)\end{matrix}$

wherein K_(ij)=Φ(x_(i))^(T)Φ(x_(j)), K_(ii)=Φ(x_(i))^(T)Φ(x_(i)), K_(jj)=Φ(x_(j))^(T)Φ(x_(j)), and K is a Mercer kernel; the numerator √(K_(ii)−2K_(ij)+K_(jj)) of Formula (9) is obtained by deducing the numerator ∥x_(i)−x_(j)∥ of Formula (8) through kernel mapping, i.e., ∥Φ(x_(i))−Φ(x_(j))∥=√(∥Φ(x_(i))−Φ(x_(j))∥²)=√(K_(ii)−2K_(ij)+K_(jj)); in the denominator of Formula (9),

$\Delta = \frac{\sum\limits_{p = 1}^{k}\left( K_{ii} - K_{ii^{p}} - K_{i^{p}i} + K_{i^{p}i^{p}} \right)\sum\limits_{q = 1}^{k}\left( K_{jj} - K_{jj^{q}} - K_{j^{q}j} + K_{j^{q}j^{q}} \right)}{k^{2}},$

wherein K_(ii^p)=Φ(x_(i))^(T)Φ(x_(i)^(p)); K_(i^p i)=Φ(x_(i)^(p))^(T)Φ(x_(i)); K_(i^p i^p)=Φ(x_(i)^(p))^(T)Φ(x_(i)^(p)); K_(jj^q)=Φ(x_(j))^(T)Φ(x_(j)^(q)); K_(j^q j)=Φ(x_(j)^(q))^(T)Φ(x_(j)); K_(j^q j^q)=Φ(x_(j)^(q))^(T)Φ(x_(j)^(q)); x_(i)^(p) (p=1, 2, . . . , k) is the p-th neighbor point of x_(i); x_(j)^(q) (q=1, 2, . . . , k) is the q-th neighbor point of x_(j);
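The numerator of Formula (9) can be evaluated directly from a Gram matrix without ever forming Φ(x). The following Python sketch shows this for all sample pairs at once. It covers only the numerator √(K_(ii)−2K_(ij)+K_(jj)); the normalization by Δ is omitted. The Gaussian kernel in `rbf_gram` and its bandwidth `sigma` are assumptions chosen for illustration, since the specification only requires a Mercer kernel.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """Gaussian (RBF) Gram matrix; one possible Mercer kernel (assumption)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] - 2.0 * X @ X.T + sq[None, :]
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def feature_space_distance(K):
    """Pairwise ||Phi(x_i) - Phi(x_j)|| = sqrt(K_ii - 2 K_ij + K_jj)."""
    diag = np.diag(K)
    d2 = diag[:, None] - 2.0 * K + diag[None, :]
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from round-off

X = np.array([[0.0, 0.0], [3.0, 4.0]])
dist_linear = feature_space_distance(X @ X.T)   # linear kernel: Euclidean distance
dist_rbf = feature_space_distance(rbf_gram(X))
```

With the linear kernel K=XX^(T) the result reduces to the ordinary Euclidean distance, which gives a quick sanity check on the identity.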

step 4.1.3: defining the sample similarity measurement, i.e., distance measurement between samples, by Formula (9) according to the labeled data and the unlabeled data among the collected data, expressed by Formula (10):

$\begin{matrix}{d(x_{i},x_{j}) = \begin{cases} \sqrt{1 - \exp\left( - \frac{\left\| x_{i} - x_{j} \right\|^{2}}{\beta} \right)} - \alpha, & \text{when } x_{i} \text{ and } x_{j} \text{ are labeled identically} \\ \sqrt{1 - \exp\left( - \frac{\left\| x_{i} - x_{j} \right\|^{2}}{\beta} \right)}, & \text{when } x_{i} \text{ and } x_{j} \text{ are unlabeled, } x_{j} \in N_{i} \text{ or } x_{i} \in N_{j} \\ \sqrt{\exp\left( - \frac{\left\| x_{i} - x_{j} \right\|^{2}}{\beta} \right)}, & \text{otherwise} \end{cases}} & (10)\end{matrix}$

wherein β is a control parameter depending on the distribution density of the collected sample data points; α is a regulation parameter;
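The three branches of Formula (10) can be sketched in Python as follows. The function name and the two boolean flags are hypothetical helpers introduced for this example; how the flags are derived from the labels and from the neighbor domains N_(i) is left to the caller.

```python
import math

def semi_supervised_distance(sq_dist, beta, alpha,
                             same_label=False, unlabeled_neighbors=False):
    """Evaluate Formula (10) for one pair of samples.

    sq_dist: squared distance ||x_i - x_j||^2.
    same_label: True when both samples carry the same label (first branch).
    unlabeled_neighbors: True when both samples are unlabeled and one lies
        in the neighbor domain of the other (second branch).
    """
    g = math.exp(-sq_dist / beta)
    if same_label:
        return math.sqrt(1.0 - g) - alpha
    if unlabeled_neighbors:
        return math.sqrt(1.0 - g)
    return math.sqrt(g)

d_same = semi_supervised_distance(0.0, beta=2.0, alpha=0.1, same_label=True)
d_unlabeled = semi_supervised_distance(0.0, beta=2.0, alpha=0.1,
                                       unlabeled_neighbors=True)
d_other = semi_supervised_distance(0.0, beta=2.0, alpha=0.1)
```

For identically labeled samples the offset α shrinks the distance, so such pairs are more likely to be selected as neighbors in step 4.1.4.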

step 4.1.4: getting k neighbors of the sample x_(i) by the distance measurement defined in Formula (10) to obtain the neighbor domain N_(i) of x_(i);

step 4.1.5: reconstructing x_(i) by k neighbor points of the sample x_(i) to calculate the minimum value of the x_(i) reconstruction error, i.e., the optimal similarity matrix W:

$\begin{matrix}{\arg\min\sum\limits_{i = 1}^{n}\left\| \Phi(x_{i}) - \sum\limits_{x_{j} \in N_{i}}W_{ij}\Phi(x_{j}) \right\|^{2}} & (11)\end{matrix}$

wherein Formula (7) is converted to Formula (11) through kernel mapping of sample points; ∥•∥ is a Euclidean norm; W_(ij) has two constraint conditions:

$\sum\limits_{x_{j} \in N_{i}}W_{ij} = 1,$

and W_(ij)=0 when x_(j)∉N_(i);
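One common way to minimize a reconstruction error of the form of Formula (11) under the sum-to-one constraint is the closed-form solution used in locally linear embedding: for each sample, solve a small linear system built from the local Gram matrix and normalize the result. The Python sketch below follows that approach. It is offered as an assumption about how the minimization could be carried out, not as the procedure prescribed by the specification; the ridge term `reg` is likewise an illustrative addition that keeps the local system solvable.

```python
import numpy as np

def reconstruction_weights(K, neighbors, reg=1e-3):
    """Kernel-space reconstruction weights with rows summing to 1.

    K: (n, n) symmetric Gram matrix, K[i, j] = Phi(x_i)^T Phi(x_j).
    neighbors: list of index lists; neighbors[i] is the neighbor domain N_i.
    reg: small ridge factor (assumption) for ill-conditioned local systems.
    """
    n = K.shape[0]
    W = np.zeros((n, n))
    for i, idx in enumerate(neighbors):
        idx = np.asarray(idx)
        # C[a, b] = (Phi(x_i) - Phi(x_a))^T (Phi(x_i) - Phi(x_b))
        C = (K[i, i] - K[idx, i][:, None] - K[i, idx][None, :]
             + K[np.ix_(idx, idx)])
        C = C + reg * max(np.trace(C), 1e-12) * np.eye(len(idx))
        w = np.linalg.solve(C, np.ones(len(idx)))
        W[i, idx] = w / w.sum()          # enforce the sum-to-one constraint
    return W

# Three collinear points with a linear kernel; x_1 lies midway between x_0 and x_2.
X = np.array([[0.0], [1.0], [2.0]])
W = reconstruction_weights(X @ X.T, neighbors=[[1, 2], [0, 2], [0, 1]])
```

Entries outside each neighbor domain are never written, so the second constraint, W_(ij)=0 when x_(j)∉N_(i), holds by construction.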

step 4.2: obtaining a local regularization matrix M;

step 4.3: obtaining the optimal solution F* of the objective function by making the partial derivative of the objective function J(F) for fault isolation in industrial process equal to 0;

$\begin{matrix}{\left. \frac{\partial J}{\partial F} \right|_{F = F^{*}} = 2D(F^{*} - Y) + 2\frac{\gamma}{n^{2}}GF^{*} + 2MF^{*} = 0 \Rightarrow \left( D + \frac{\gamma}{n^{2}}G + M \right)F^{*} = DY \Rightarrow F^{*} = \left( D + \frac{\gamma}{n^{2}}G + M \right)^{- 1}DY;} & (12)\end{matrix}$
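The closed-form solution of Formula (12), followed by the label assignment of Formula (4), can be sketched in Python as below. The function name `solve_labels` and the toy matrices are assumptions made for illustration; the linear system is solved directly rather than by forming the inverse, which is a numerical convenience and gives the same F*.

```python
import numpy as np

def solve_labels(D, G, M, Y, gamma):
    """Solve (D + gamma/n^2 * G + M) F* = D Y and take the row-wise arg max."""
    n = Y.shape[0]
    A = D + (gamma / n**2) * G + M
    F_star = np.linalg.solve(A, D @ Y)
    labels = np.argmax(F_star, axis=1) + 1   # categories are numbered 1..c
    return F_star, labels

# Toy example: samples 0 and 1 are labeled (classes 1 and 2);
# sample 2 is linked to sample 0 and sample 3 to sample 1.
W = np.zeros((4, 4))
W[0, 2] = W[2, 0] = 1.0
W[1, 3] = W[3, 1] = 1.0
G = np.diag(W.sum(axis=1)) - W               # Formula (5)
D = np.diag([1.0, 1.0, 0.0, 0.0])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
M = np.zeros((4, 4))                         # local item switched off here
F_star, labels = solve_labels(D, G, M, Y, gamma=16.0)
```

In this toy graph each unlabeled sample inherits the label of the labeled sample it is connected to, so `labels` is [1, 2, 1, 2].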

according to the fault isolation method of industrial process based on regularization framework, step 4.2 includes the steps of:

step 4.2.1: determining k neighbor points of the sample point x_(i) through Euclidean distance, and defining the set of the k neighbor points as N_(i)={x_(i_j)}_(j=1)^(k), wherein x_(i_j) represents the j-th neighbor point of the sample point x_(i);

step 4.2.2: establishing a loss function expressed by Formula (13) to cause sample classification labels to be distributed smoothly,

$\begin{matrix}{{J( g_{i} )} = {{\sum\limits_{j = 1}^{k}( {f_{i_{j}} - {g_{i}( x_{i_{j}} )}} )^{2}} + {\lambda \; {S( g_{i} )}}}} & (13)\end{matrix}$

wherein the first item is the sum of errors of the predictedclassification labels and actual classification labels of all samples; λis a regulation parameter; the second item S(g_(i)) is a penaltyfunction; the function g_(i): R^(m)→R, and

${{g_{i}(x)} = {{\sum\limits_{j = 1}^{d}{\beta_{i,j}{p_{j}(x)}}} + {\sum\limits_{j = 1}^{k}{\alpha_{i,j}{\varphi_{i,j}(x)}}}}},$

which enable each sample point to reach a classification label throughthe mapping:

f _(i) _(j) =g _(i)(x _(i) _(j) ), j=1,2, . . . ,k  (14)

wherein f_(i) _(j) is the classification label of the j th neighborpoint of the sample point x_(i);

${d = \frac{( {m + s - 1} )!}{{m!}{( {s - 1} )!}}},$

m is the dimension of x, and s is the partial derivative order ofsemi-norm; {p_(j)(x)}_(i=1) ^(d) constitutes polynomial space with theorder not less than s, and 2s>m; φ_(i,j)(x) is a Green function; β_(i,j)and φ_(i,j) are two coefficients of the Green function;

step 4.2.3: obtaining the estimated classification label loss of the set N_(i) of neighbor points of the sample point x_(i) by calculating the minimum value of the loss function established in step 4.2.2,

wherein for k dispersed sample data points, the minimum value of the loss function J(g_(i)(x)) can be estimated by Formula (15),

$\begin{matrix}{J(g_{i}) \approx \sum\limits_{j = 1}^{k}\left( f_{i_{j}} - g_{i}(x_{i_{j}}) \right)^{2} + \lambda\alpha_{i}^{T}H_{i}\alpha_{i}} & (15)\end{matrix}$

wherein H_(i) is a symmetric k×k matrix whose (r,z) element is H_(r,z)=φ_(i,z)(x_(i_r)); α_(i)=[α_(i,1), α_(i,2), . . . , α_(i,k)]∈R^(k) and β_(i)=[β_(i,1), β_(i,2), . . . , β_(i,d-1)]^(T)∈R^(k); wherein for a small λ, the minimum value of the loss function J(g_(i)(x)) can be estimated by the label matrix to obtain the estimated classification label loss of the set N_(i) of neighbor points of the sample point x_(i):

J(g_(i))≈λF_(i)^(T)M_(i)F_(i)  (16)

wherein F_(i)=[f_(i_1), f_(i_2), . . . , f_(i_k)]∈R^(k) corresponds to the classification labels of the k data in N_(i); M_(i) is the upper left k×k subblock matrix of the inverse matrix of the coefficient matrix and is calculated by Formula (17):

α_(i)^(T)(H_(i)+λI)α_(i)=F_(i)^(T)M_(i)F_(i)  (17)
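The specification does not write out the coefficient matrix whose inverse yields M_(i). The Python sketch below assumes the block form that is usual for regularized spline interpolation, [[H_(i)+λI, P],[P^(T), 0]], where P holds the polynomial basis evaluated at the k neighbors. This block form, the linear basis [1, x], and the thin-plate Green function r² log r (appropriate for m=2, s=2) are all assumptions made for this example only.

```python
import numpy as np

def local_matrix(X_nb, lam=1e-4):
    """Sketch of M_i as the upper-left k x k block of an inverse matrix.

    X_nb: (k, m) array holding the k neighbors of one sample point.
    Assumed coefficient matrix: [[H + lam*I, P], [P^T, 0]].
    """
    k, m = X_nb.shape
    r = np.linalg.norm(X_nb[:, None, :] - X_nb[None, :, :], axis=2)
    # Assumed Green function: thin-plate spline r^2 log r, set to 0 at r = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        H = np.where(r > 0, r**2 * np.log(r), 0.0)
    P = np.hstack([np.ones((k, 1)), X_nb])       # linear basis [1, x]
    d = P.shape[1]
    A = np.block([[H + lam * np.eye(k), P],
                  [P.T, np.zeros((d, d))]])
    return np.linalg.inv(A)[:k, :k], P

rng = np.random.default_rng(0)
M_i, P = local_matrix(rng.normal(size=(6, 2)))
```

Under these assumptions M_(i) is symmetric and satisfies M_(i)P=0, so label vectors that vary polynomially over the neighborhood incur no local loss in Formula (16).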

step 4.2.4: collecting the estimated classification label losses of the neighbor domains {N_(i)}_(i=1)^(n) of the n sample points together to obtain the total estimated classification label loss, and calculating the minimum value of the total loss E(f), i.e., the classification label of the sample data, so as to obtain the local regularization matrix M; the total estimated classification label loss is expressed by Formula (18),

$\begin{matrix}{{E(f)} \approx {\lambda {\sum\limits_{i = 1}^{n}{F_{i}^{T}M_{i}F_{i}}}}} & (18)\end{matrix}$

wherein f=[f₁, f₂, . . . , f_(n)]^(T)∈R^(n) is the vector of the classification labels; wherein when the coefficient λ in Formula (18) is neglected, Formula (18) is converted to Formula (19):

$\begin{matrix}{{E(f)} \propto {\sum\limits_{i = 1}^{n}{F_{i}^{T}M_{i}F_{i}}}} & (19)\end{matrix}$

wherein according to the row selection matrix S_(i)∈R^(k×n), F_(i)=S_(i)f; wherein the element S_(i)(u,v) in the u-th row and the v-th column of S_(i) can be defined by Formula (20):

$\begin{matrix}{S_{i}(u,v) = \begin{cases} 1, & \text{if } v = i_{u} \\ 0, & \text{otherwise} \end{cases}} & (20)\end{matrix}$

wherein F_(i)=S_(i)f is substituted into Formula (19) to obtain E(f)∝f^(T)Mf, wherein

$M = \sum\limits_{i = 1}^{n}S_{i}^{T}M_{i}S_{i}.$
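Because each S_(i) only selects rows, the sum of S_(i)^(T)M_(i)S_(i) can be accumulated by adding every M_(i) into the rows and columns indexed by the neighbor domain N_(i), without building S_(i) explicitly. The following Python sketch shows this; the function name `assemble_local_matrix` and the toy inputs are hypothetical.

```python
import numpy as np

def assemble_local_matrix(neighbor_idx, local_mats, n):
    """Accumulate M = sum_i S_i^T M_i S_i by scatter-adding each M_i.

    neighbor_idx: list of index lists; neighbor_idx[i] holds i_1..i_k
        (distinct indices) for sample i.
    local_mats: list of (k, k) matrices M_i.
    """
    M = np.zeros((n, n))
    for idx, M_i in zip(neighbor_idx, local_mats):
        idx = np.asarray(idx)
        M[np.ix_(idx, idx)] += M_i      # equivalent to S_i^T M_i S_i
    return M

# Check against the explicit row selection matrix of Formula (20).
idx = [2, 0]
M_0 = np.array([[1.0, 2.0], [3.0, 4.0]])
M = assemble_local_matrix([idx], [M_0], n=4)
S_0 = np.zeros((2, 4))
S_0[np.arange(2), idx] = 1.0
M_explicit = S_0.T @ M_0 @ S_0
```

The scatter form avoids storing n sparse k×n matrices and gives the same result as the explicit product.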

The present invention has the following beneficial effects: fault isolation that uses a large number of cheap unlabeled data samples for training, on the basis of a small number of labeled data samples, can effectively enhance the accuracy of fault isolation. To make full use of the known labeled sample data, the method provided by the present invention uses the local regularization item to give the optimal solution desirable properties, and uses the global regularization item to remedy the insufficient fault isolation precision that the local regularization item may cause when there are too few samples in the neighbor domain, so as to make the classification labels smooth. The fault isolation method uses a small number of labeled data samples to train the fault isolation model of the system and makes full use of the statistical distribution and other information of a large number of unlabeled data samples to enhance the generalization ability, overall performance and precision of the fault isolation model. Experiments show that the method provided by the present invention is not only feasible but also provides high fault isolation precision. The experiments also show that the fault isolation effect depends to a great extent on the proportion of labeled sample data and on the model parameters.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is the flow chart of the fault isolation method of industrial process based on regularization framework for one embodiment of the present invention.

FIG. 2 is the structural diagram of the hot galvanizing pickling waste liquor treatment process for one embodiment of the present invention.

FIG. 3 is the flow chart of the hot galvanizing pickling waste liquor treatment process shown in FIG. 2.

FIG. 4a is the result graph of simulating 700 sampled test data with fault 1 after modeling by 5% labeled samples for one embodiment of the present invention.

FIG. 4b is the result graph of simulating 700 sampled test data with fault 1 after modeling by 10% labeled samples for one embodiment of the present invention.

FIG. 4c is the result graph of simulating 700 sampled test data with fault 1 after modeling by 15% labeled samples for one embodiment of the present invention.

FIG. 5a is the result graph of simulating 700 sampled test data with fault 2 after modeling by 5% labeled samples for one embodiment of the present invention.

FIG. 5b is the result graph of simulating 700 sampled test data with fault 2 after modeling by 10% labeled samples for one embodiment of the present invention.

FIG. 5c is the result graph of simulating 700 sampled test data with fault 2 after modeling by 15% labeled samples for one embodiment of the present invention.

FIG. 6a is the monitoring result graph of the influence of testing the regulation parameter γ=10⁻¹ on fault isolation performance for one embodiment of the present invention.

FIG. 6b is the monitoring result graph of the influence of testing the regulation parameter γ=10¹ on fault isolation performance for one embodiment of the present invention.

FIG. 6c is the monitoring result graph of the influence of testing the regulation parameter γ=10² on fault isolation performance for one embodiment of the present invention.

FIG. 6d is the monitoring result graph of the influence of testing the regulation parameter γ=10³ on fault isolation performance for one embodiment of the present invention.

FIG. 6e is the monitoring result graph of the influence of testing the regulation parameter γ=10⁴ on fault isolation performance for one embodiment of the present invention.

FIG. 6f is the monitoring result graph of the influence of testing the regulation parameter γ=10⁵ on fault isolation performance for one embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

One embodiment of the present invention is detailed in combination with the figures.

The fault isolation method of industrial process based on regularization framework provided by the embodiment, as shown in FIG. 1, includes the steps of:

step 1: collecting the sample data in industrial process;

step 2: filtering the collected sample data to remove singular sample data and retain available sample data; wherein the available sample data includes labeled sample data and unlabeled sample data; the labeled sample data is data whose characteristics have been differentiated by experienced experts or workers, who label the collected data as normal sample data or fault sample data together with the category of the corresponding fault state, so that these sample data have classification labels; the unlabeled data is data which is directly collected but not labeled and whose classification label is to be predicted, wherein the available sample data set is expressed as:

T={(x₁,y₁), . . . , (x_(l),y_(l))}∪{x_(l+1), . . . , x_(n)}; x_(j)∈R^(d), j=1, . . . , n  (1)

wherein d is the number of variables; n is the number of samples; x_(i)|_(i=1)^(l) is the labeled sample data, and x_(i)|_(i=l+1)^(n) is the unlabeled data; y_(i)∈{1, 2, . . . , c}, i=1, . . . , l, wherein c is the number of fault state categories, and l is the number of labeled samples;

step 3: establishing an objective function for fault isolation in industrial process,

$\begin{matrix}{J(F) = \min\limits_{F \in R^{n \times c}}\mathrm{tr}\left( (F - Y)^{T}D(F - Y) + \frac{\gamma}{n^{2}}F^{T}GF + F^{T}MF \right)} & (2)\end{matrix}$

wherein F is a predicted classification label matrix; tr is the trace of the matrix; D is a diagonal matrix whose diagonal elements are D_(ii)=D_(l)>0 for i=1, . . . , l and D_(ii)=D_(u)≧0 for i=l+1, . . . , n, and the concrete values of D_(l) and D_(u) are selected manually based on experience; (F−Y)^(T)D(F−Y) is the empirical loss used to measure the difference between the predicted classification label and the initial classification label; γ is a regulation parameter to be determined by test;

$\frac{\gamma}{n^{2}}F^{T}GF$

is a global regularization item, and G is a global regularization matrix; F^(T)MF is a local regularization item, and M is a local regularization matrix; Y∈R^(n×c) is an initial classification label matrix, and the elements of Y are defined as follows:

$\begin{matrix}{Y_{ij} = \begin{cases} 1, & \text{if } x_{i} \text{ is labeled as the category } j \text{ fault state, where } j \text{ is one of the } c \text{ fault state categories} \\ 0, & \text{otherwise} \end{cases}} & (3)\end{matrix}$

step 4: calculating the optimal solution for the objective function for fault isolation in industrial process by the available sample data set;

step 4.1: obtaining a global regularization matrix G according to the improved similarity measurement algorithm and KNN (k-Nearest Neighbor) classification algorithm,

wherein in the fault isolation process, labeled sample data is only in the minority, and sufficient fault isolation precision cannot be ensured by the unconstrained optimization problem of the minimization standard framework, so some labeled samples are required to direct the solving of F. The global regularization item ∥f∥_(I)² reflects the inherent geometric distribution information of p(x). p(x) is the distribution probability of samples, p(y|x) is the conditional probability of the classification label y given that the sample x is known, and samples distributed more densely are most likely to have similar classification labels; that is, if x₁ and x₂ are adjacent, then p(y|x₁)≈p(y|x₂), and x₁ and x₂ have similar classification labels. In other words, p(y|x) shall be very smooth under the geometric properties within p(x). ∥f∥_(I)² is a Riemann integral with the form as follows:

$\left\| f \right\|_{I}^{2} = \int_{x \in M}\left\| \nabla_{M}f \right\|^{2}\,\mathrm{d}p(x)$

wherein f is a real-valued function; M represents the low-dimensional data manifold, ∇_(M)f is the gradient of f along M, and ∥f∥_(I)² reflects the smoothness of f. ∥f∥_(I)² can be further approximately expressed as:

$\left\| f \right\|_{I}^{2} = \frac{\gamma}{n^{2}}F^{T}GF$

wherein G can be calculated by Formula (5),

G=S−W∈R ^(n×n)  (5)

wherein Formula (5) is further improved by a regularized Laplacian matrix to obtain Formula (6):

$\begin{matrix}{G = I - S^{- \frac{1}{2}}WS^{- \frac{1}{2}} \in R^{n \times n}} & (6)\end{matrix}$

wherein I is the n×n unit matrix; S is a diagonal matrix, wherein the diagonal elements are

$S_{ii} = \sum\limits_{j = 1}^{n}W_{ij},$

i=1, 2, . . . , n; W∈R^(n×n) is a similarity matrix; W and the sample points x_(i)|_(i=1)^(n) form an undirected weighted graph in which each vertex corresponds to a sample point and the edge W_(ij) corresponds to the similarity of the sample points x_(i)|_(i=1)^(n) and x_(j)|_(j=1)^(n); the precision of the final fault classification is determined by the calculation method of W; W is calculated by the method of local reconstruction using the neighbor points of the sample point x_(i), and the reconstruction error equation is as follows:

$\begin{matrix}{\sum\limits_{i = 1}^{n}\left\| x_{i} - \sum\limits_{j = 1}^{k}W_{ij}x_{ij} \right\|^{2}} & (7)\end{matrix}$

wherein

$\sum\limits_{j = 1}^{k}W_{ij} = 1,$

and the minimum value of Formula (7) is calculated to get W and then G by Formula (5); the specific steps for calculating W are as follows:

step 4.1.1: obtaining the distance measurement between x_(i) and its k neighbor points by the improved distance Formula (8) to calculate the distance between sample points, i.e., sample similarity measurement;

$\begin{matrix}{W_{ij} = d(x_{i},x_{j}) = \frac{\left\| x_{i} - x_{j} \right\|}{\sqrt{M(i)M(j)}}} & (8)\end{matrix}$

M(i) and M(j) respectively represent the average value of distances between the sample point x_(i) and its k neighbors and the average value of distances between the sample point x_(j) and its k neighbors;
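Formula (8) rescales the Euclidean distance by the local density around each of the two points. A minimal Python sketch is given below. It assumes that the k neighbors used for M(i) are the k nearest points in Euclidean distance and that no two samples coincide (otherwise a mean neighbor distance could be zero); the function name `scaled_distance` is hypothetical.

```python
import numpy as np

def scaled_distance(X, k):
    """Pairwise d(x_i, x_j) = ||x_i - x_j|| / sqrt(M(i) M(j)), Formula (8)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Column 0 of each sorted row is the point itself (distance 0), so the
    # k nearest neighbors occupy columns 1..k.
    mean_knn = np.sort(dist, axis=1)[:, 1:k + 1].mean(axis=1)
    return dist / np.sqrt(np.outer(mean_knn, mean_knn))

X = np.array([[0.0], [1.0], [3.0]])
d_scaled = scaled_distance(X, k=1)
```

In this toy example M=[1, 1, 2], so the pair (x₁, x₂), which is 2 apart in a sparser region, is scaled down to √2.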

step 4.1.2: converting Formula (8) to Formula (9) through kernel mapping;

$\begin{matrix}{d(x_{i},x_{j}) = \frac{\sqrt{K_{ii} - 2K_{ij} + K_{jj}}}{\sqrt{\Delta}}} & (9)\end{matrix}$

wherein K_(ij)=Φ(x_(i))^(T)Φ(x_(j)), K_(ii)=Φ(x_(i))^(T)Φ(x_(i)), K_(jj)=Φ(x_(j))^(T)Φ(x_(j)), and K is a Mercer kernel; the numerator √(K_(ii)−2K_(ij)+K_(jj)) of Formula (9) is obtained by deducing the numerator ∥x_(i)−x_(j)∥ of Formula (8) through kernel mapping, i.e., ∥Φ(x_(i))−Φ(x_(j))∥=√(∥Φ(x_(i))−Φ(x_(j))∥²)=√(K_(ii)−2K_(ij)+K_(jj)); in the denominator of Formula (9),

$\Delta = \frac{\sum\limits_{p = 1}^{k}\left( K_{ii} - K_{ii^{p}} - K_{i^{p}i} + K_{i^{p}i^{p}} \right)\sum\limits_{q = 1}^{k}\left( K_{jj} - K_{jj^{q}} - K_{j^{q}j} + K_{j^{q}j^{q}} \right)}{k^{2}}$

which is obtained by deducing the denominator of Formula (8) through kernel mapping, and the specific deducing process is as follows: given that

$M(i) = \frac{1}{k}\left( \sum\limits_{p = 1}^{k}\left\| x_{i} - x_{i}^{p} \right\| \right) \text{ and } M(j) = \frac{1}{k}\left( \sum\limits_{q = 1}^{k}\left\| x_{j} - x_{j}^{q} \right\| \right),$

the following formula can be obtained:

$M(i)M(j) = \left\lbrack \frac{1}{k}\left( \sum\limits_{p = 1}^{k}\left\| x_{i} - x_{i}^{p} \right\| \right) \right\rbrack\left\lbrack \frac{1}{k}\left( \sum\limits_{q = 1}^{k}\left\| x_{j} - x_{j}^{q} \right\| \right) \right\rbrack = \frac{\sum\limits_{p = 1}^{k}\left\lbrack (x_{i} - x_{i}^{p})^{T}(x_{i} - x_{i}^{p}) \right\rbrack\sum\limits_{q = 1}^{k}\left\lbrack (x_{j} - x_{j}^{q})^{T}(x_{j} - x_{j}^{q}) \right\rbrack}{k^{2}}\overset{\text{kernelized}}{\longrightarrow}\frac{\sum\limits_{p = 1}^{k}\left( K_{ii} - K_{ii^{p}} - K_{i^{p}i} + K_{i^{p}i^{p}} \right)\sum\limits_{q = 1}^{k}\left( K_{jj} - K_{jj^{q}} - K_{j^{q}j} + K_{j^{q}j^{q}} \right)}{k^{2}} = \Delta$

wherein K_(ii^p)=Φ(x_(i))^(T)Φ(x_(i)^(p)); K_(i^p i)=Φ(x_(i)^(p))^(T)Φ(x_(i)); K_(i^p i^p)=Φ(x_(i)^(p))^(T)Φ(x_(i)^(p)); K_(jj^q)=Φ(x_(j))^(T)Φ(x_(j)^(q)); K_(j^q j)=Φ(x_(j)^(q))^(T)Φ(x_(j)); K_(j^q j^q)=Φ(x_(j)^(q))^(T)Φ(x_(j)^(q)); x_(i)^(p) (p=1, 2, . . . , k) is the p-th neighbor point of x_(i); x_(j)^(q) (q=1, 2, . . . , k) is the q-th neighbor point of x_(j);

step 4.1.3: defining the sample similarity measurement, i.e., distance measurement between samples, by Formula (9) according to the labeled data and the unlabeled data among the collected data, expressed by Formula (10):

$\begin{matrix}{d(x_{i},x_{j}) = \begin{cases} \sqrt{1 - \exp\left( - \frac{\left\| x_{i} - x_{j} \right\|^{2}}{\beta} \right)} - \alpha, & \text{when } x_{i} \text{ and } x_{j} \text{ are labeled identically} \\ \sqrt{1 - \exp\left( - \frac{\left\| x_{i} - x_{j} \right\|^{2}}{\beta} \right)}, & \text{when } x_{i} \text{ and } x_{j} \text{ are unlabeled, } x_{j} \in N_{i} \text{ or } x_{i} \in N_{j} \\ \sqrt{\exp\left( - \frac{\left\| x_{i} - x_{j} \right\|^{2}}{\beta} \right)}, & \text{otherwise} \end{cases}} & (10)\end{matrix}$

wherein β is a control parameter depending on the distribution density of the collected sample data points; α is a regulation parameter;

step 4.1.4: getting k neighbors of the sample x_(i) by the distance measurement defined in Formula (10) to obtain the neighbor domain N_(i) of x_(i);

step 4.1.5: reconstructing x_(i) by k neighbor points of the sample x_(i) to calculate the minimum value of the x_(i) reconstruction error, i.e., the optimal similarity matrix W:

$\begin{matrix}{\arg\min\sum\limits_{i = 1}^{n}\left\| \Phi(x_{i}) - \sum\limits_{x_{j} \in N_{i}}W_{ij}\Phi(x_{j}) \right\|^{2}} & (11)\end{matrix}$

wherein Formula (7) is converted to Formula (11) through kernel mapping of sample points; ∥•∥ is a Euclidean norm; W_(ij) has two constraint conditions:

$\sum\limits_{x_{j} \in N_{i}}W_{ij} = 1,$

and W_(ij)=0 when x_(j)∉N_(i);
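Once W is available, the global regularization matrix of Formula (6) follows from a few array operations. The Python sketch below assumes that every row sum S_(ii) is strictly positive; the function name `normalized_laplacian` is hypothetical.

```python
import numpy as np

def normalized_laplacian(W):
    """G = I - S^(-1/2) W S^(-1/2), with S_ii = sum_j W_ij (Formula (6))."""
    s = W.sum(axis=1)                    # diagonal of S; assumed positive
    inv_sqrt = 1.0 / np.sqrt(s)
    return np.eye(W.shape[0]) - inv_sqrt[:, None] * W * inv_sqrt[None, :]

W_toy = np.array([[0.0, 1.0], [1.0, 0.0]])
G_toy = normalized_laplacian(W_toy)
```

When the rows of W already sum to 1, as the first constraint on W_(ij) requires, S is the identity matrix and Formula (6) reduces to G=I−W.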

step 4.2: obtaining a local regularization matrix M;

step 4.2.1: determining k neighbor points of the sample point x_(i) through Euclidean distance, and defining the set of the k neighbor points, i.e., the neighbor domain of x_(i), as N_(i)={x_(i_j)}_(j=1)^(k), wherein x_(i_j) represents the j-th neighbor point of the sample point x_(i);

step 4.2.2: establishing a loss function expressed by Formula (13) to cause sample classification labels to be distributed smoothly,

$\begin{matrix}{{J( g_{i} )} = {{\sum\limits_{j = 1}^{k}( {f_{i_{j}} - {g_{i}( x_{i_{j}} )}} )^{2}} + {\lambda \; {S( g_{i} )}}}} & (13)\end{matrix}$

wherein the first item

$\sum\limits_{j = 1}^{k}( {f_{i_{j}} - {g_{i}( x_{i_{j}} )}} )^{2}$

is the sum of errors of the predicted classification labels and actualclassification labels of all samples; λ is a regulation parameter; thesecond item S(g_(i)) is a penalty function; the function g_(i):R^(m)→R,and

${{g_{i}(x)} = {{\sum\limits_{j = 1}^{d}{\beta_{i,j}{p_{j}(x)}}} + {\sum\limits_{j = 1}^{k}{\alpha_{i,j}{\Phi_{i,j}(x)}}}}},$

which enable each sample point to reach a classification label throughthe mapping:

f _(i) _(j) =g _(i)(x _(i) _(j) ), j=1,2, . . . ,k  (14)

wherein f_(i) _(j) is the classification label of the j th neighbor point of the sample point x_(i);

${d = \frac{( {m + s - 1} )!}{{m!}{( {s - 1} )!}}},$

m is the dimension of x, and s is the partial derivative order of the semi-norm; {p_(j)(x)}_(j=1) ^(d) constitutes polynomial space with the order not less than s, and 2s>m; φ_(i,j)(x) is a Green function; β_(i,j) and α_(i,j) are the coefficients of the polynomial terms and of the Green functions, respectively;

step 4.2.3: obtaining the estimated classification label loss of the set N_(i) of neighbor points of the sample point x_(i) by calculating the minimum value of the loss function established in step 4.2.2;

For k dispersed sample data points, the minimum value of the loss function J(g_(i)(x)) can be estimated by Formula (15),

$\begin{matrix}{{J( g_{i} )} \approx {{\sum\limits_{j = 1}^{k}\; ( {f_{i_{j}} - {g_{i}( x_{i_{j}} )}} )^{2}} + {{\lambda\alpha}_{i}^{T}H_{i}\alpha_{i}}}} & (15)\end{matrix}$

wherein H_(i) is a symmetric k×k matrix whose (r,z) element is H_(r,z)=φ_(i,z)(x_(i) _(r) ), α_(i)=[α_(i,1), α_(i,2), . . . , α_(i,k)]∈R^(k) and β_(i)=[β_(i,1), β_(i,2), . . . , β_(i,d-1)]^(T)∈R^(k); for a small λ (for example, λ=0.0001), the minimum value of the loss function J(g_(i)(x)) can be estimated by the classification label matrix to obtain the estimated classification label loss of the set N_(i) of neighbor points of the sample point x_(i):

J(g _(i))≈λF _(i) ^(T) M _(i) F _(i)  (16)

wherein F_(i)=[f_(i) ₁ , f_(i) ₂ , . . . , f_(i) _(k) ]∈R^(k) corresponds to the classification labels of the k data in N_(i); M_(i) is the upper left k×k subblock matrix of the inverse matrix of the coefficient matrix and is calculated by Formula (17):

α_(i) ^(T)(H _(i) +λI)α_(i) =F _(i) ^(T) M _(i) F _(i)  (17)

step 4.2.4: collecting the estimated classification label losses of the neighbor domains {N_(i)}_(i=1) ^(n) of the n sample points together to obtain the total estimated classification label loss, and calculating the minimum value of the total loss E(f), i.e., the classification label of the sample data, so as to obtain the local regularization matrix M; the total estimated classification label loss is expressed by Formula (18),

$\begin{matrix}{{E(f)} \approx {\lambda {\sum\limits_{i = 1}^{n}\; {F_{i}^{T}M_{i}F_{i}}}}} & (18)\end{matrix}$

wherein f=[f₁, f₂, . . . , f_(n)]^(T)∈R^(n) is the vector of the classification labels; when the coefficient λ in Formula (18) is neglected, Formula (18) is converted to Formula (19):

$\begin{matrix}{{E(f)} \propto {\sum\limits_{i = 1}^{n}\; {F_{i}^{T}M_{i}F_{i}}}} & (19)\end{matrix}$

wherein according to the row selection matrix S_(i)∈R^(k×n), F_(i)=S_(i)f; wherein the element S_(i)(u,v) in the u th row and the v th column of S_(i) is defined by Formula (20):

$\begin{matrix}{{S_{i}( {u,v} )} = \{ \begin{matrix}{1,} & {{{if}\mspace{14mu} v} = i_{u}} \\{0,} & {else}\end{matrix} } & (20)\end{matrix}$

wherein F_(i)=S_(i)f is substituted into Formula (19) to obtain E(f)∝f^(T)Mf, wherein

${M = {\sum\limits_{i = 1}^{n}\; {S_{i}^{T}M_{i}S_{i}}}};$
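The assembly of M from the local matrices can be illustrated with a short sketch. It builds S_(i) as in Formula (20), accumulates M as the sum of S_(i)^(T)M_(i)S_(i), and compares f^(T)Mf with the sum of the local losses F_(i)^(T)M_(i)F_(i). The neighbor lists, the local matrices M_(i) and the label vector f are hypothetical values chosen only to make the check easy to follow, indices are zero-based here, and the function names are assumptions.

```python
def selection_matrix(neighbor_idx, n):
    """Row selection matrix S_i (k x n) of Formula (20): S_i[u][v] = 1 iff v == i_u."""
    return [[1.0 if v == i_u else 0.0 for v in range(n)] for i_u in neighbor_idx]

def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def quad_form(A, x):
    """Quadratic form x^T A x."""
    return sum(x[r] * A[r][c] * x[c] for r in range(len(x)) for c in range(len(x)))

def assemble_local_matrix(neighborhoods, local_mats, n):
    """M = sum_i S_i^T M_i S_i, accumulated entry by entry without forming S_i."""
    M = [[0.0] * n for _ in range(n)]
    for idx, M_i in zip(neighborhoods, local_mats):
        for u, row in enumerate(idx):
            for v, col in enumerate(idx):
                M[row][col] += M_i[u][v]
    return M

def total_local_loss(neighborhoods, local_mats, f):
    """Sum over i of F_i^T M_i F_i with F_i = S_i f, as in Formula (19)."""
    n = len(f)
    return sum(quad_form(M_i, matvec(selection_matrix(idx, n), f))
               for idx, M_i in zip(neighborhoods, local_mats))

# Hypothetical example: n = 4 samples, k = 2 neighbors each.
neighborhoods = [[0, 1], [1, 2], [2, 3], [3, 0]]
local_mats = [[[1.0, -1.0], [-1.0, 1.0]] for _ in neighborhoods]
f = [1.0, 2.0, 4.0, 7.0]

M = assemble_local_matrix(neighborhoods, local_mats, len(f))
global_loss = quad_form(M, f)                                  # f^T M f
local_loss = total_local_loss(neighborhoods, local_mats, f)    # sum of F_i^T M_i F_i
```

Accumulating the entries of M_(i) directly at the neighbor indices gives the same M as the explicit products with S_(i) and avoids storing the mostly zero selection matrices.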

step 4.3: obtaining the optimal solution F* of the objective function by setting the partial derivative of the objective function J(F) for fault isolation in industrial process equal to 0;

$\begin{matrix}\begin{matrix}{\left. \frac{\partial J}{\partial F} \right|_{F = F^{*}} = 2D( F^{*} - Y ) + 2\frac{\gamma}{n^{2}}GF^{*} + 2MF^{*} = 0} \\{\Rightarrow ( D + \frac{\gamma}{n^{2}}G + M )F^{*} = DY} \\{\Rightarrow F^{*} = ( D + \frac{\gamma}{n^{2}}G + M )^{- 1}DY}\end{matrix} & (12)\end{matrix}$
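The closed-form solution of Formula (12) can be checked numerically with a small sketch. The matrices D, G, M and Y below are hypothetical toy values (three samples, two categories, the middle sample unlabeled with D_(u)=0), and γ is chosen so that γ/n² = 1; none of them come from the embodiment. The linear system is solved by plain Gaussian elimination instead of forming the inverse explicitly, which is an implementation choice of this sketch.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def optimal_label_matrix(D_diag, G, M, Y, gamma):
    """F* = (D + gamma/n^2 G + M)^(-1) D Y, solved one label column at a time."""
    n, c = len(Y), len(Y[0])
    A = [[(D_diag[r] if r == s else 0.0) + gamma / n ** 2 * G[r][s] + M[r][s]
          for s in range(n)] for r in range(n)]
    cols = [solve_linear(A, [D_diag[r] * Y[r][j] for r in range(n)]) for j in range(c)]
    return [[cols[j][r] for j in range(c)] for r in range(n)]

def stationarity_residual(F, D_diag, G, M, Y, gamma):
    """Largest |entry| of D(F - Y) + gamma/n^2 G F + M F, which vanishes at F*."""
    n, c = len(Y), len(Y[0])
    worst = 0.0
    for r in range(n):
        for j in range(c):
            val = D_diag[r] * (F[r][j] - Y[r][j])
            val += sum((gamma / n ** 2 * G[r][s] + M[r][s]) * F[s][j] for s in range(n))
            worst = max(worst, abs(val))
    return worst

# Hypothetical toy problem: samples 0 and 2 are labeled, sample 1 is unlabeled.
D_diag = [1.0, 0.0, 1.0]
G = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
M = [[0.5, -0.5, 0.0], [-0.5, 1.0, -0.5], [0.0, -0.5, 0.5]]
Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
gamma = 9.0

F_star = optimal_label_matrix(D_diag, G, M, Y, gamma)
residual = stationarity_residual(F_star, D_diag, G, M, Y, gamma)
```

In this symmetric toy case the unlabeled middle sample receives equal scores for both categories, while each labeled sample scores highest for its own category.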

step 5: obtaining the predicted classification label matrix by Formula (4) according to the optimal solution F* to determine the fault information in the process.

$\begin{matrix}{f_{i} = {\underset{1 \leq j \leq c}{argmax}F_{ij}^{*}}} & (4)\end{matrix}$

wherein f_(i) is the predicted classification label of the sample point x_(i).
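The label assignment of Formula (4) amounts to a row-wise argmax over F*. A minimal sketch follows; the function name and the entries of F* are hypothetical, and categories are numbered from 1 to c as in the text.

```python
def predict_labels(F_star):
    """Return, for each row of F*, the 1-based index of its largest entry."""
    return [max(range(len(row)), key=lambda j: row[j]) + 1 for row in F_star]

# Hypothetical F* for four samples and c = 3 categories.
F_star = [
    [0.90, 0.10, 0.00],
    [0.20, 0.70, 0.10],
    [0.30, 0.30, 0.40],
    [0.05, 0.60, 0.35],
]
labels = predict_labels(F_star)
```

If two categories tie for the largest score, this sketch returns the lower-numbered one; the text does not specify a tie-breaking rule.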

To verify the effectiveness of the fault isolation method of industrial process based on regularization framework provided by the embodiment in isolating faults in industrial process with various fault types, the experiment platform shown in FIG. 2 is used to conduct a simulation experiment.

The experiment platform shown in FIG. 2 is the hot galvanizing pickling waste liquor treatment process. During hot galvanizing production, iron and steel workpieces are first degreased by alkali solution and then etched by hydrochloric acid to remove rust and oxide film from the surfaces of the workpieces.

Iron and steel react with hydrochloric acid to produce iron salts as follows:

FeO+2HCl→FeCl₂+H₂O

Fe₂O₃+6HCl→2FeCl₃+3H₂O

5FeO+O₂+14HCl→4FeCl₃+FeCl₂+7H₂O

Fe+2HCl→FeCl₂+H₂↑

The reactions show that when iron and steel are pickled in hydrochloric acid, two iron salts are produced: ferric chloride and ferrous chloride. In general, few of the pickled pieces are heavily rusted, so most of the product is ferrous chloride. As the iron salts increase, the concentration of the hydrochloric acid becomes lower, which is commonly referred to as failure. The usual method was to discard the hydrochloric acid near failure, but this method is no longer used owing to greater awareness and control of environmental protection and the development of recovery technology. In fact, the waste acid sometimes has high concentration, and the discarded acid solution may contain more acid than that taken out during the usual cleaning after pickling. Therefore, this is an important pollution source and also a waste of resources. The best method is to recycle the acid solution.

During hot galvanizing production of the embodiment, the technological process for treating pickling waste acid is shown in FIG. 3 as follows: waste acid produced during pickling in a hot galvanizing plant is input into a waste liquor tank with a stirrer, excess iron powder is added to reduce ferric iron to ferrous iron, and the reduced solution is then further purified through solid-liquid separation to obtain waste acid solution with ferrous chloride as the major ingredient; an appropriate amount of ferrous chloride solution is input into a reaction kettle, and iron red (or iron yellow) crystal seed is prepared by regulating temperature, pH value, concentration, air input and stirring rate and controlling the time; the crystal seed serves as condensation nuclei; ferrous chloride waste acid solution is transferred to generate iron red (or iron yellow) through oxidation by regulating temperature, pH value, concentration, air input and stirring rate and controlling the time; the generated iron red (or iron yellow) solution is treated through solid-liquid separation, the solid powder is dried and then packaged into products, the ammonium chloride mother liquor in the liquid can be prepared into ammonium chloride by-products through evaporation and crystallization, and the evaporation condensate water is returned to the system for use.

According to the above introduction and research on chemical and physical changes, the experiment platform is mainly composed of a waste liquor tank, a reaction kettle (overall reaction system), a filter pressing device, a pipeline valve, pumps, a control system, a distribution box, an electric control cabinet, a power supply cabinet, an air compressor, etc. Variables of the whole system include: temperature, pressure and liquid level in the reaction kettle, flow entering the reaction kettle, current of the transfer pump 1, current of the transfer pump 2, speed and current of the metering pump 1, speed and current of the metering pump 2, speed and current of the metering pump 3, speed and current of the metering pump 4, and current, voltage and speed of the stirrer in the reaction kettle. The faults and fault types of the hot galvanizing pickling waste liquor treatment process shown by the experiment platform are shown in Table 1.

TABLE 1 Fault Description (Feature) of Hot Galvanizing Pickling Waste Liquor Treatment Process

Fault Name                                               Fault Type
Fault 1: Transfer pump 1 suddenly stalls due to fault    Step
Fault 2: Pipeline control valve fails                    Step

It is extremely difficult to obtain labeled sample data in an actual industrial process, so a small amount of such data is selected in the embodiment as training data, which includes three states: normal, fault 1 and fault 2.

In the embodiment, the first set of 700 sampled data with fault 1 is simulated first. This set of test samples mainly includes normal data and data with fault 1; specifically, the first 300 sample points operate normally and then fault 1 occurs. To determine the influence of different numbers of labeled data samples on monitoring results, 5% labeled samples, 10% labeled samples and 15% labeled samples are respectively selected by the embodiment for modeling, and then the process monitoring results are observed. As shown in FIG. 4a, FIG. 4b and FIG. 4c, it can be found that for the model, normal characteristics can be extracted from the first 300 data, and then the characteristics of fault 1 can be extracted from the remaining 400 data, so it can be determined that the fault in the test sample occurs at the 300th sample point. During modeling, different numbers of labeled data samples and their corresponding different monitoring results are shown successively in FIG. 4a, FIG. 4b and FIG. 4c.
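Reading the fault onset from a sequence of predicted labels can be sketched as follows. This is an illustration only: the label coding (1 = normal, 2 = fault 1, 3 = fault 2), the persistence rule that requires several consecutive non-normal labels, and the function name are assumptions, and the label sequence is synthetic. It mirrors the layout of the first test set (300 normal points followed by 400 points with fault 1) but is not the experimental output.

```python
def fault_onset(labels, normal=1, persist=5):
    """Index of the first sample that starts a run of `persist` non-normal labels.

    Returns None if no such run exists. The persistence rule is an assumed
    safeguard against isolated misclassifications.
    """
    run = 0
    for t, lab in enumerate(labels):
        run = run + 1 if lab != normal else 0
        if run == persist:
            return t - persist + 1
    return None

# Synthetic labels: 300 normal samples followed by 400 samples of fault 1,
# with one isolated misclassification inserted in the normal segment.
labels = [1] * 300 + [2] * 400
labels[120] = 3
onset = fault_onset(labels)  # zero-based index, i.e. 300 normal samples precede it
```

With `persist=1` the isolated misclassification at index 120 would be reported as the onset, which is why a short persistence window is used in this sketch.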

It can be seen from FIG. 4a that under normal conditions, the maximum category difference is approximately equal to 0.6, and although the category differentiation is not high, three types of characteristics can be respectively extracted without overlap. The category difference is approximately equal to 1 in case of a fault. Although the category differentiation is very high, and fault 1 can be isolated, the characteristics of the normal data and the characteristics of fault 2 have very low differentiation and have large overlap. As a whole, the sample point where a fault occurs can be found exactly by this set of experiments.

It can be seen from FIG. 4b that under normal conditions, the maximum category difference is approximately equal to 0.7, and although the category differentiation is not high, only normal characteristics can be extracted, and fault 1 and fault 2 have serious overlap. The category difference is approximately equal to 0.9 in case of a fault. Although the category differentiation is very high, and fault 1 can be isolated, the characteristics of the normal data and the characteristics of fault 2 have very low differentiation and have large overlap. As a whole, the sample point where a fault occurs can be found exactly by this set of experiments.

It can be seen from FIG. 4c that under normal conditions, the maximum category difference is approximately equal to 0.7, and although the category differentiation is not high, only normal characteristics can be extracted, and fault 1 and fault 2 have serious overlap. The category difference is approximately equal to 0.9 in case of a fault. Although the category differentiation is very high, and fault 1 can be isolated, the characteristics of the normal data and the characteristics of fault 2 have very low differentiation and have large overlap. As a whole, the sample point where a fault occurs can be found exactly by this set of experiments.

As shown in FIG. 4a, FIG. 4b and FIG. 4c, it can be found that for the model, normal characteristics can be extracted from the first 300 data of the test sample, and then the characteristics of fault 1 can be extracted from the remaining 400 data, so it can be determined that the fault in the test sample occurs at the 300th sample point. However, as the number of the labeled sample data among the training data increases, the direction information increases, which is good for category determination of unlabeled data. The category differentiation increases gradually, i.e., the fault isolation effect is better, and the influence of interference is less. The results shown in FIG. 4b and FIG. 4c are basically consistent, and it can be found that when the training data includes two labeled samples, the fault isolation performance has basically become saturated, showing that once the labeled samples reach a certain quantity, the increase in the category differentiation slows or even levels off.

In the embodiment, the second set of 700 sampled data with fault 2 is then simulated. This set of test samples mainly includes normal data and data with fault 2; specifically, the first 350 sample points operate normally and then fault 2 occurs. To determine the influence of different numbers of labeled data samples on monitoring results, training data with 5% labeled samples, training data with 10% labeled samples and training data with 15% labeled samples are respectively selected by the embodiment for modeling, and then the process monitoring results are observed, as shown in FIG. 5a, FIG. 5b and FIG. 5c. It can be found that normal characteristics can be extracted from the first 350 data of the test sample, and then the characteristics of fault 2 can be extracted from the remaining 350 data, so it can be determined that the fault in the test sample occurs at the 350th sample point. During modeling, different numbers of labeled data samples and their corresponding different monitoring results are shown successively in FIG. 5a, FIG. 5b and FIG. 5c.

It can be seen from FIG. 5a that under normal conditions, the maximum category difference is approximately equal to 0.5, and although the category differentiation is not high, three types of characteristics can be respectively extracted without overlap. The maximum category difference is approximately equal to 0.8 in case of a fault. Although the category differentiation is very high, and fault 2 can be isolated, the characteristics of the normal data and the characteristics of fault 1 have very low differentiation and have large overlap. In case of a fault, these characteristic curves fluctuate obviously and are vulnerable to interference. But when the 350th sample point is the turning point, the turning slope is larger. As a whole, the sample point where a fault occurs can be found exactly by this set of experiments.

It can be seen from FIG. 5b that under normal conditions, the maximum category difference is approximately equal to 0.8, and although the category differentiation is not high, only normal characteristics can be extracted, and fault 1 and fault 2 have serious overlap. The maximum category difference is approximately equal to 0.8 in case of a fault. Although the category differentiation is very high, and fault 2 can be isolated, the characteristics of the normal data and the characteristics of fault 1 have very low differentiation and have large overlap. In case of a fault, these characteristic curves fluctuate obviously and are vulnerable to interference. But when the 350th sample point is the turning point, the turning slope is larger. As a whole, the sample point where a fault occurs can be found exactly by this set of experiments.

It can be seen from FIG. 5c that the diagnosis effect is basically consistent with that shown in FIG. 5b; under normal conditions, the maximum category difference is approximately equal to 0.8, and although the category differentiation is not high, only normal characteristics can be extracted, and fault 1 and fault 2 have serious overlap. The maximum category difference is approximately equal to 0.8 in case of a fault. Although the category differentiation is very high, and fault 2 can be isolated, the characteristics of the normal data and the characteristics of fault 1 have very low differentiation and have large overlap.

As shown in FIG. 5a, FIG. 5b and FIG. 5c, it can be found that for the model, normal characteristics can be extracted from the first 350 data of the test sample, and then the characteristics of fault 2 can be extracted from the remaining 350 data, so it can be determined that the fault in the test sample occurs at the 350th sample point. However, as the number of the labeled samples among the training data increases, the direction information increases, which is good for category determination of unlabeled data. The category differentiation increases gradually, i.e., the fault isolation effect is better, and the influence of interference is less. The results shown in FIG. 5b and FIG. 5c are basically consistent, and it can be found that when the training data includes two labeled samples, the fault isolation performance has basically become saturated, showing that once the labeled samples reach a certain quantity, the increase in the category differentiation slows or even levels off.

The experiments show that modeling with training data containing 10% labeled samples already obtains a good fault monitoring effect, which matches the practical situation in which it is difficult to obtain many labeled samples in advance. In practice, fault information is not easy to obtain because faults are highly harmful, and the cost of labeling is high, so little labeled data is available. The fault isolation method of industrial process based on regularization framework provided by the embodiment can obtain good fault isolation results from a small number of labeled samples. Therefore, the fault isolation method of industrial process based on regularization framework provided by the embodiment is effective for process monitoring and fault isolation.

In the embodiment, the first set of test data with fault 1 and 10% labeled samples is then simulated and used to observe the influence of the regulation parameter γ on the fault isolation performance, so as to determine the optimal regulation parameter γ. This set of test samples mainly includes normal data and data with fault 1; as before, the first 300 sample points operate normally and then fault 1 occurs. The monitoring results of the influence of the regulation parameter γ on the fault isolation performance are shown successively in FIG. 6a to FIG. 6f.

When γ=10⁻¹, it can be seen from FIG. 6a that the maximum category difference is approximately equal to 0.9 under normal conditions, and the maximum category difference is approximately equal to 1 in case of a fault. The category differentiation is very high, but the curves oscillate violently and are vulnerable to interference. Fault 1 can be monitored, but the characteristics of the normal data and the characteristics of fault 2 have very low differentiation and have large overlap. As a whole, the performance at this time is poor.

When γ=10¹ and γ=10², it can be seen from FIG. 6b and FIG. 6c that the maximum category difference is approximately equal to 0.9 under normal conditions, the category differentiation is very high, and the oscillation is relatively small. The maximum category difference is approximately equal to 1 in case of a fault. The category differentiation is very high; not only can fault 1 be monitored, but the characteristic curves also fluctuate less and are less vulnerable to interference. As a whole, the performance at this time is optimal.

When γ=10³ and γ=10⁴, it can be seen from FIG. 6d and FIG. 6e that the maximum category difference is approximately equal to 0.07 under normal conditions, and the category differentiation is very low, which is not good for characteristic extraction. The maximum category difference is approximately equal to 0.07 in case of a fault. The category differentiation is very low, which is not good for characteristic extraction. The fault characteristics can be extracted, but the extraction is vulnerable to interference. As a whole, the performance at this time is poor.

When γ=10⁵, it can be seen from FIG. 6f that fault 1 occurring at the 300th sample point cannot be monitored at all, which may be caused by the category difference being too small, so the fault characteristics cannot be extracted, and the system cannot be applied at all at this time.

Conclusion: when 10¹<γ<10², the best results are obtained. When γ<10⁻¹, i.e., γ is too small, the curves show high category differentiation but oscillate violently and are vulnerable to interference. When 10³<γ<10⁴, i.e., γ is moderately large, the category difference is small with less oscillation. When γ>10⁵, i.e., γ is too large, the categories cannot be differentiated.

The fault isolation method of industrial process based on regularization framework provided by the embodiment uses the local regularization item to give the optimal solution ideal properties, and uses the global regularization item to remedy the problem of insufficient fault isolation precision which may be caused by the local regularization item owing to too few samples in the neighbor domain, so as to make the classification label smooth. Experiments show that the fault isolation method of industrial process based on regularization framework provided by the embodiment is not only feasible but also provides high fault isolation precision. In addition, it can be deduced from the experiments that the fault isolation effect of the method depends to a great extent on the proportion of labeled samples and on the model parameters.

What is claimed is:
 1. A fault isolation method of industrial processbased on regularization framework, comprising the steps of: step 1:collecting sample data in industrial process; step 2: filtering thecollected sample data to remove singular sample data and retainavailable sample data, wherein the available sample data includeslabeled sample data and unlabeled sample data, the labeled sample datais used by experienced experts or workers to differentiate thecharacteristics of the collected data and respectively label thecollected data as normal sample data, fault sample data and categoriesof their corresponding fault states to enable these sample data to haveclassification labels; the unlabeled data is the data which is directlycollected but not labeled and not having classification labels, whereinthe available sample data set is expressed as:T=={(x ₁ ,y ₁), . . . (x _(l) ,y _(l))}∪{x _(l+1) , . . . x _(n) }; x_(j) ∈R ^(d) , j=1, . . . ,n  (1) wherein d is the number of variables;n is the number of samples; x_(i)|_(i=1) ^(l) is the labeled sampledata, and x_(i)|_(i=l+1) ^(n) is the unlabeled data; y_(i)∈{1, 2, . . ., c}, i=1, . . . , l, wherein c is the category of the fault state, andl is the number of the labeled samples; step 3: establishing anobjective function for fault isolation in industrial process with localand global regularization items, $\begin{matrix}{{J(F)} = {\min\limits_{F \in R^{n \times c}}{{tr}( {{( {F - Y} )^{T}{D( {F - Y} )}} + {\frac{\gamma}{n^{2}}F^{T}{GF}} + {F^{T}{MF}}} )}}} & (2)\end{matrix}$ wherein J(F) is the objective function for fault isolationin industrial process; F is a predicted classification label matrix; tris the trace symbol of the matrix; D is a diagonal matrix, wherein thediagonal elements are D_(ii)=D_(l)>0, i=1, . . . , l, D_(ii)=D_(u)≧0,and i=l+1, . . . 
, n; (F−Y)^(T)D(F−Y) is empirical loss used to measurethe difference value between predicted classification label and initialclassification label; γ is a regulation parameter;$\frac{\gamma}{n^{2}}F^{T}{GF}$ is a global regularization item, and Gis a global regularization matrix; F^(T)MF is a local regularizationitem, and M is a local regularization matrix; Y∈R^(n×c) is an initialclassification label matrix, and the elements of Y are defined asfollows: $\begin{matrix}{Y_{ij} = \{ {\begin{matrix}{1,} & {\begin{matrix}{{{if}\mspace{14mu} x_{i}\mspace{14mu} {is}\mspace{14mu} {labeled}\mspace{14mu} {as}\mspace{14mu} {category}\mspace{14mu} j\mspace{14mu} {fault}\mspace{14mu} {state}},} \\{j\mspace{14mu} {is}\mspace{14mu} {one}\mspace{14mu} {of}\mspace{14mu} {category}\mspace{14mu} c\mspace{14mu} {fault}}\end{matrix}\;} \\{0,} & {otherwise}\end{matrix};} } & (3)\end{matrix}$ step 4: calculating the optimal solution F* for theobjective function for fault isolation in industrial process shown inFormula (2) by the available sample data set; step 5: obtaining thepredicted classification label matrix by Formula (4) according to theoptimal solution F* to determine the fault information in the process,$\begin{matrix}{f_{i} = {\underset{1 \leq j \leq c}{argmax}F_{ij}^{*}}} & (4)\end{matrix}$ wherein f_(i) is the predicted classification label of thesample point x_(i).
 2. The fault isolation method of industrial processbased on regularization framework of claim 1, wherein step 4 comprisesthe steps of: step 4.1: obtaining a global regularization matrix Gaccording to the improved similarity measurement algorithm and k-nearestneighbor (KNN) classification algorithm, wherein G can be calculated byFormula (5),G=S−W∈R ^(n×n)  (5) wherein Formula (5) is further improved by aregularized Laplacian matrix to obtain Formula (6): $\begin{matrix}{G = {{I - {S^{- \frac{1}{2}}{WS}^{- \frac{1}{2}}}} \in R^{n \times n}}} & (6)\end{matrix}$ wherein I is the unit matrix of k×k; S is a diagonalmatrix, wherein the diagonal elements are${S_{ij} = {\sum\limits_{j = 1}^{n}\; W_{ij}}},$ i=1, 2, . . . , n;W∈R^(n×n) is a similarity matrix; W and the sample point x_(i)|_(i=1)^(n) form an undirected weighted graph with the vertex corresponding tothe sample point and the edge W_(ij) corresponding to the similarity ofthe sample points x_(i)|_(i=1) ^(n) and x_(j)|_(j=1) ^(b); the precisionof the final fault classification is determined by the calculationmethod of W, W is calculated by the method of local reconstruction usingneighbor points of the sample point x_(i), and the reconstruction errorequation is as follows: $\begin{matrix}{\sum\limits_{i = 1}^{n}\; {{x_{i} - {\sum\limits_{j = 1}^{k}\; {W_{ij}x_{ij}}}}}^{2}} & (7)\end{matrix}$ wherein ${{\sum\limits_{i = 1}^{k}\; W_{ij}} = 1},$ andthe minimum value of Formula (7) is calculated to get W and then G byFormula (5); the specific steps for calculating W are as follows: step4.1.1: obtaining the distance measurement between x_(i) and its kneighbor points by the improved distance formula (8) to calculate thedistance between sample points, i.e., sample similarity measurement;$\begin{matrix}{W_{ij} = {{d( {x_{i},x_{j}} )} = \frac{{x_{i} - x_{j}}}{\sqrt{{M(i)}{M(j)}}}}} & (8)\end{matrix}$ M(i) and M(j) respectively represent the average value ofdistances between the sample point x_(i) and its k 
neighbors and theaverage value of distances between the sample point x_(j) and its kneighbors; step 4.1.2: converting Formula (8) to Formula (9) throughkernel mapping; $\begin{matrix}{{d( {x_{i},x_{j}} )} = \frac{\sqrt{K_{ii} - {2K_{ij}} + K_{jj}}}{\sqrt{\Delta}}} & (9)\end{matrix}$ wherein K_(ij)=Φ(x_(i))^(T)Φ(x_(j)),K_(ii)=Φ(x_(i))^(T)Φ(x_(i)), K_(jj)=Φ(x_(j))^(T)Φ(x_(j)), and K isMercer kernel; the numerator √{square root over (K_(ii)−2K_(ij)+K_(jj))}of Formula (9) is obtained by deducing the numerator ∥x_(i)−x_(j)∥ ofFormula (8) through kernel mapping, i.e., ∥Φ(x_(i))−Φ(x_(j))∥=√{squareroot over (∥Φ(x_(i))−Φ(x_(j))∥²)}=√{square root over(K_(ii)−2K_(ij)+K_(jj))}; in the denominator of Formula (9),${\Delta = \frac{\sum\limits_{p = 1}^{k}\; {( {K_{ii} - K_{{ii}^{p}} - K_{i^{p}i} + K_{i^{p}i^{p}}} ){\sum\limits_{q = 1}^{k}\; ( {K_{jj} - K_{{jj}^{p}} - K_{j^{p}j} + K_{j^{p}j^{p}}} )}}}{k^{2}}},$wherein K_(ii) _(p) =Φ(x_(i))^(T)Φ(x_(i) ^(p)); K_(i) _(p) _(i)=Φ(x_(i)^(p))^(T)Φ(x_(i)); K_(i) _(p) _(i) _(p) =Φ(x_(i) ^(p))^(T)Φ(x_(i) ^(p));K_(jj) _(q) =Φ(x_(j))^(T)Φ(x_(j) ^(q)); K_(j) _(q) _(j)=Φ(x_(j)^(q))^(T)Φ(x_(j)); K_(j) _(q) _(j) _(q) =Φ(x_(j) ^(q))^(T)Φ(x_(j) ^(q));x_(i) ^(p) (p=1, 2 . . . k) is the p th neighbor point of x_(i); x_(q)^(j) (q=1, 2 . . . 
k) is the q th neighbor point of x_(j); step 4.1.3:defining the sample similarity measurement, i.e., distance measurementbetween samples, by Formula (9) according to the labeled data and theunlabeled data among the collected data, expressed by Formula (10):$\begin{matrix}{{d( {x_{i},x_{j}} )} = \{ \begin{matrix}{{\sqrt{1 - {\exp ( {- \frac{{{x_{i} - x_{j}}}^{2}}{\beta}} )}} - \alpha},} & {{when}\mspace{14mu} x_{i}\mspace{14mu} {and}\mspace{14mu} x_{j}\mspace{14mu} {are}\mspace{14mu} {labeled}\mspace{14mu} {identically}} \\\sqrt{1 - {\exp ( {- \frac{{{x_{i} - x_{j}}}^{2}}{\beta}} )}} & {{{when}\mspace{14mu} x_{i}\mspace{14mu} {and}\mspace{14mu} x_{j}\mspace{14mu} {are}\mspace{14mu} {{un}{labeled}}},{x_{j} \in {N_{i}\mspace{14mu} {or}\mspace{14mu} x_{i}} \in N_{j}}} \\{\sqrt{\exp ( {- \frac{{{x_{i} - x_{j}}}^{2}}{\beta}} )},} & {otherwise}\end{matrix} } & (10)\end{matrix}$ wherein β is a control parameter depending on thedistribution density of the collected sample data points; α is aregulation parameter; step 4.1.4: getting k neighbors of the samplex_(i) by the distance measurement defined in Formula (10) to obtain theneighbor domain N_(i) of x_(i); step 4.1.5: reconstructing x_(i) by kneighbor points of the sample x_(i) to calculate the minimum value ofx_(i) reconstruction error, i.e., the optimal similarity matrix W:$\begin{matrix}{{argmin}{\sum\limits_{i = 1}^{n}\; {{{\Phi ( x_{i} )} - {\sum\limits_{x_{j} \in N_{i}}\; {W_{ij}{\Phi ( x_{i} )}}}}}^{2}}} & (11)\end{matrix}$ wherein Formula (7) is converted to Formula (11) throughkernel mapping of sample points; ∥•∥ is an Euclidean noun; W_(ij) hastwo constraint conditions:${{\sum\limits_{x_{j} \in N_{i}}W_{ij}} = 1},$ and W_(ij)=0 whenx_(j)∉N_(i); step 4.2: obtaining a local regularization matrix M; step4.3: obtaining the optimal solution F* of the objective function bymaking the partial derivative of the objective function J(F) for faultisolation in industrial process 
equal to 0; $\begin{matrix}{ \frac{\partial J}{\partial F} |_{F = F^{*}} = {{{2{D( {F^{*} - Y} )}} + {2\frac{\gamma}{n^{2}}{GF}^{*}} + {2{MF}}} = { 0\Rightarrow{( {D + {\frac{\gamma}{n^{2}}G} + M} )F^{*}}  = { {DY}\Rightarrow F^{*}  = {( {D + {\frac{\gamma}{n^{2}}G} + M} )^{- 1}{{DY}.}}}}}} & (12)\end{matrix}$
 3. The fault isolation method of industrial process basedon regularization framework of claim 2, wherein step 4.2 comprises thesteps of: step 4.2.1: determining k neighbor points of the sample pointx_(i) through Euclidean distance, and defining the set of the k neighborpoints as N_(i)={x_(i) _(j) }_(j=1) ^(k), wherein x_(i) _(j) representsthe j th neighbor point of the sample point x_(i); step 4.2.2:establishing a loss function expressed by Formula (13) to cause sampleclassification labels to be distributed smoothly; $\begin{matrix}{{J( g_{i} )} = {{\sum\limits_{j = 1}^{k}\; ( {f_{i_{j}} - {g_{i}( x_{i_{j}} )}} )^{2}} + {\lambda \; {S( g_{i} )}}}} & (13)\end{matrix}$ wherein the first item is the sum of errors of thepredicted classification labels and actual classification labels of allsamples; λ is a regulation parameter; the second item S(g_(i)) is apenalty function; the function g_(i):R^(m)→R, and${{g_{i}(x)} = {{\sum\limits_{j = 1}^{d}\; {\beta_{i,j}{p_{j}(x)}}} + {\sum\limits_{j = 1}^{k}\; {\alpha_{i,j}{\varphi_{i,j}(x)}}}}},$which enable each sample point to reach a classification label throughthe mapping:f _(i) _(j) =g _(i)(x _(i) _(j) ), j=1,2, . . . 
,k  (14) wherein f_(i)_(j) is the classification label of the j th neighbor point of thesample point x_(i);${d = \frac{( {m + s - 1} )!}{{m!}{( {s - 1} )!}}},$m is the dimension of x, and s is the partial derivative order ofsemi-norm; {p_(j)(x)}_(i=1) ^(d) constitutes polynomial space with theorder not less than s, and 2s>m; φ_(i,j)(x) is a Green function; β_(i,j)and φ_(i,j) are two coefficients the Green function; step 4.2.3:obtaining the estimated classification label loss of the set N_(i) ofneighbor points of the sample point x_(i) by calculating the minimumvalue of the loss function established in step 4.2.2, wherein for kdispersed sample data points, the minimum value of the loss functionJ(g_(i)(x)) can be estimated by Formula (15), $\begin{matrix}{{J( g_{i} )} \approx {{\sum\limits_{j = 1}^{k}\; ( {f_{i_{j}} - {g_{i}( x_{i_{j}} )}} )^{2}} + {\lambda \; \alpha_{i}^{T}H_{i}\alpha_{i}}}} & (15)\end{matrix}$ wherein H_(i) is the symmetric matrix of k×k, and its(r,z) elements are H_(r,z)=φ_(i,z)(x_(i) _(r) ), α_(i)=[α_(i,1),α_(i,2), . . . , α_(i,k)]∈R^(k) and β_(i)=[β_(i,1), β_(i,2), . . . ,β_(i,d-1)]^(T)∈R^(k), wherein for a smaller λ, the minimum value of theloss function J(g_(i)(x)) can be estimated by the label matrix to obtainthe estimated classification label loss of the set N_(i) of neighborpoints of the sample point x_(i):J(g _(i))≈λF _(i) ^(T) M _(i) F _(i)  (16) wherein F_(i)=[f_(i) ₁ ,f_(i) ₂ , . . . 
, f_(i) _(k) ]∈R^(k) corresponds to the classificationlabels of k data in N_(i); M_(i) is the upper left k×k subblock of theinverse coefficient matrix and is calculated by Formula (17):α_(i) ^(T)(H _(i) +λI)α_(i) =F _(i) ^(T) M _(i) F _(i)  (17) step 4.2.4:collecting the estimated classification label losses of the neighbordomains {N_(i)}_(i=1) ^(n) of n sample points together to obtain thetotal estimated classification label loss, and calculating the minimumvalue of the total loss E(f), i.e., the classification label of thesample data, so as to obtain the local regularization matrix M; thetotal estimated classification label loss is expressed by Formula (18),$\begin{matrix}{{E(f)} \approx {\lambda {\sum\limits_{i = 1}^{n}\; {F_{i}^{T}M_{i}F_{i}}}}} & (18)\end{matrix}$ wherein f=[f₁, f₂, . . . , f_(n)]^(T)∈R^(n) is the vectorof the classification label, wherein when the coefficient λ in Formula(18) is neglected, Formula (18) is converted to Formula (19):$\begin{matrix}{{E(f)} \propto {\sum\limits_{i = 1}^{n}\; {F_{i}^{T}M_{i}F_{i}}}} & (19)\end{matrix}$ wherein according to the row selection matrixS_(i)∈R^(k×n), F_(i)=S_(i)f; wherein the elements S_(i)(u,v) in the u throw and the v th column of S_(i) can be defined by Formula (20):$\begin{matrix}{{S_{i}( {u,v} )} = \{ \begin{matrix}{1,} & {{{if}\mspace{14mu} v} = i_{u}} \\{0,} & {otherwise}\end{matrix} } & (20)\end{matrix}$ wherein F_(i)=S_(i)f is substituted into Formula (20) toobtain E(f)∝f^(T)Mf, wherein$M = {\sum\limits_{i = 1}^{n}\; {S_{i}^{T}M_{i}{S_{i}.}}}$